Text-to-speech (TTS) is a technology that converts text into sound using a phonetization system. It is especially useful in assistive devices for blind people, who need information delivered as sound because of their visual limitations. During the pandemic, visually impaired people felt limited in going out, which made it harder for them to know their health condition. It is therefore necessary to build an assistive device that reads text-based data such as body temperature, heart rate per minute, and oxygen level aloud in a voice the user can hear. The method used in this study is finite state automata (FSA), which splits Indonesian words into syllables according to their syllable patterns and thereby facilitates the pronunciation process of the assistive device, so that visually impaired users can find out their health condition. The system was tested using a confusion matrix: syllable splitting was 100% accurate, and the temperature, heart rate, and oxygen saturation sensors reached accuracies of 99.71%, 98%, and 95%, respectively.
1. INTRODUCTION

The development of knowledge and technology is making human work relatively easier, because technology is created in response to problems and concerns that arise among its users. Today's computer science and engineering have provided many benefits in our lives; one example is automata theory, a major area concerned with how efficiently algorithms solve problems in computational models. In recent years, many researchers have created and developed applications that help people with visual impairments communicate with others, one of which is text-to-speech (TTS) technology. TTS converts text into sound using a phonetization system, that is, phonemes arranged to form an utterance [1], and current systems are already quite good at this conversion [2]. The purpose of this technology is to enable computers to communicate and interact in everyday spoken language. TTS has been widely used by researchers, for example to recognize Vietnamese directly [3] or through a voice bot [4], to recognize Japanese and English [1], to conduct interviews [5], in agriculture [6], and to help people with aphasia [7]. In this study, during the configuration process, words are split into syllables using Indonesian language rules so that the resulting sound is heard as Indonesian.
Many studies have focused on the visually impaired, but most of them address walking aids for the blind using computer vision [8], [9], GPS [10], [11], and ultrasonic sensors [12], [13]. Only a few studies discuss how information about a blind person's health condition can be converted from text data into voice. Such a system is expected to be very useful: research has shown that people with reading disabilities are greatly helped by TTS applications [14].

Many methods are used for text processing; one of the most widely used is the finite state automata (FSA) method. Researchers have used it to detect syllables in Indonesian [15], to read the prefix of a word [16], to read and translate Indonesian into Madurese [17], and to convert Latin script into Sundanese [18]. The method is not limited to translating text: it has also been used to make predictions on time series data [19], for verification tests [20], for measuring document similarity [21], and even to help secure IP [22].

These studies show that the method achieves sufficient accuracy, above 80% [17], [18], [23]. However, they have limitations: they only process text and few convert it into speech, and the data they use do not yet include health data such as body temperature, heart rate, and oxygen level. Time information such as the date, month, year, hour, and minute is not included either. Adding these would be very useful in a device that helps blind people access information about their health condition. The rest of this paper is organized as follows: Section 2 presents the method and the idea behind the research approach, Section 3 presents and discusses the experimental results, and Section 4 concludes the paper with suggestions for future work.


2. METHOD 
2.1. Text-to-speech 

A TTS system, commonly called TTS synthesis, is a computer-based system that reads text aloud automatically, whether the text comes from a computer input stream or from a scanned document passed to an optical character recognition (OCR) engine. Speech synthesizers can be implemented in hardware or software; they have improved very rapidly over the decades, and many high-quality TTS systems are now commercially available. Synthesis is often based on natural speech: units taken from recorded natural speech are joined together to form words or sentences (concatenative synthesis). This approach has become very popular in recent years because, unlike its simpler predecessors, it is sensitive to the context of each unit. Rhythm is an important factor in making synthesized speech natural and understandable, and the prosodic structure provides the information needed to build a prosody generation model whose effect is heard in the synthesized speech. Many TTS systems are built on corpus-based speech synthesis because its output sounds natural and is of high quality.


2.2. Finite state automata 

An FSA is an abstract mathematical model that describes the behavior of a logical machine and can be used to explain how a physical machine, a program, an algorithm, or a problem-solving procedure works. In language theory, an FSA can recognize strings of a regular language, that is, a language generated by a regular grammar. There is thus a two-way relationship between regular languages and FSAs: for every regular language an FSA can be constructed that recognizes it, and conversely, the language recognized by any FSA is regular.

A finite state machine may or may not produce output; a finite state machine that produces no output is known as an FSA [24]. An FSA starts in the initial state S0 and receives a series of inputs, each of which can move it to a next state. A certain subset of its states is designated as final states. Changes from one state to the next follow rules formulated as a transition function.

An FSA is the automaton of a regular language. It has a finite number of states and can move from one state to another [25]; each state change is given by the transition function. Because an FSA has no storage, its ability to 'remember' is limited to the current state. Examples of FSAs include elevators, text editors, lexical analyzers, network communication protocols, and parity checkers. Formally, an FSA is defined as a 5-tuple:


$M = (S, \Sigma, \delta, S_0, F)$

where $S$ is a finite set of states, $\Sigma$ is a finite set of input symbols (the alphabet), and $\delta: S \times \Sigma \to S$ is the transition function that governs the movement of the machine: it takes a state and an input symbol as arguments and returns a state. $S_0 \in S$ is the initial state and $F \subseteq S$ is the set of final states.
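The 5-tuple maps directly onto plain data structures. The following is a minimal Python sketch, not part of the original system; the class name DFA and the even-parity machine (one of the parity checks mentioned above) are illustrative assumptions, and δ is assumed to be total, i.e. defined for every state and symbol.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class DFA:
    """M = (S, Sigma, delta, S0, F) held as plain Python data."""
    states: FrozenSet[str]                 # S
    alphabet: FrozenSet[str]               # Sigma
    delta: Dict[Tuple[str, str], str]      # delta: S x Sigma -> S (assumed total)
    start: str                             # S0
    finals: FrozenSet[str]                 # F

    def run(self, symbols: Iterable[str]) -> str:
        """Feed symbols one at a time; only the current state is kept."""
        state = self.start
        for sym in symbols:
            if sym not in self.alphabet:
                raise ValueError(f"symbol {sym!r} is not in the alphabet")
            state = self.delta[(state, sym)]
        return state

    def accepts(self, symbols: Iterable[str]) -> bool:
        """True if the machine ends in a final state."""
        return self.run(symbols) in self.finals


# Illustrative machine: accepts bit strings with an even number of 1s.
even_parity = DFA(
    states=frozenset({"S0", "S1"}),
    alphabet=frozenset({"0", "1"}),
    delta={("S0", "0"): "S0", ("S0", "1"): "S1",
           ("S1", "0"): "S1", ("S1", "1"): "S0"},
    start="S0",
    finals=frozenset({"S0"}),
)

print(even_parity.accepts("1001"))  # True  (two 1s)
print(even_parity.accepts("1011"))  # False (three 1s)
```

Because run keeps only the current state between symbols, the sketch mirrors the limited memory of an FSA described above.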
The behavior of an FSA is expressed as a transition table or a transition diagram. Table 1 shows an example transition table, which defines the transition function, and Figure 1 shows the corresponding transition diagram.


Table 1. Transition table (δ)

State | Input 0 | Input 1
S0 | S0 | S0
S1 | S1 | S1
S2 | S2 | S2
S3 | S3 | S3


Figure 1. Example of a transition diagram


2.3. Research framework 

The research framework is divided into four parts: input, process, output, and measurement. The input consists of five kinds of data: date and time data, heart rate data, temperature data, and oxygen saturation data, which are processed by the FSA method, and voice recording data, which is used to turn the FSA output into sound. The output is the date and time, heart rate, temperature, and oxygen saturation, spoken aloud. The last part is testing, in which a confusion matrix is used to determine how accurately the FSA method recognizes and processes the input. Note that the input data come from sensor readings stored in the database. Details of the research framework are shown in Figure 2.


[Figure 2 panels: input (date and time data; heart rate and temperature data; oxygen saturation data; voice recording database) → process (finite state automata; text-to-speech) → output (voice-based date information; voice-based time information; voice-based temperature, heart rate, and oxygen saturation information) → measurement (confusion matrix)]

Figure 2. Research framework
2.4. Application of finite state automata

The FSA method is used to split the input words, which are then passed to the TTS engine to be processed and output as sound. As Figure 3 shows, the FSA method consists of four stages. The first is text normalization (Table 2): every sentence containing numbers, currency units, symbols, times, dates, temperatures, units, or abbreviations is normalized first.


[Figure 3 stages: text normalization → consonant recognition → word classification and fragmentation → resulting syllable]

Figure 3. FSA stage


Table 2. Normalization 


Text | Normalization result
0 | Nol
1 | Satu
2 | Dua
3 | Tiga
4 | Empat
5 | Lima
6 | Enam
7 | Tujuh
8 | Delapan
9 | Sembilan
10 | Sepuluh
19 < N < 100 | ... Puluh
11 | Sebelas
99 < N < 1000 | ... Ratus
100 | Seratus
999 < N < 10000 | ... Ribu
1,000 | Seribu
°C | Derajat celsius
% | Persen
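As a hedged sketch of the normalization step, the function below spells integers in Indonesian following the patterns of Table 2 (plus the belas form for 12 to 19, which Table 2 does not list but the time example in Section 3 uses) and replaces °C and % with their words. The names number_to_words and normalize are hypothetical; the sketch assumes numbers are written without thousands separators and does not cover currency, dates, or abbreviations.

```python
import re

DIGITS = ["nol", "satu", "dua", "tiga", "empat",
          "lima", "enam", "tujuh", "delapan", "sembilan"]
SYMBOLS = {"°C": "derajat celsius", "%": "persen"}


def number_to_words(n: int) -> str:
    """Spell a non-negative integer below one million in Indonesian."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("this sketch handles 0 <= n < 1,000,000 only")
    if n < 10:
        return DIGITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{DIGITS[n - 10]} belas"            # 12-19
    if n < 100:
        tens, rest = divmod(n, 10)                  # 20-99: "... puluh"
        head = f"{DIGITS[tens]} puluh"
    elif n < 1000:
        hundreds, rest = divmod(n, 100)             # 100-999: "... ratus"
        head = "seratus" if hundreds == 1 else f"{DIGITS[hundreds]} ratus"
    else:
        thousands, rest = divmod(n, 1000)           # 1000 and up: "... ribu"
        head = "seribu" if thousands == 1 else f"{number_to_words(thousands)} ribu"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace integers and the °C / % symbols with Indonesian words."""
    for symbol in SYMBOLS:
        text = text.replace(symbol, f" {symbol} ")
    text = re.sub(r"\d+", lambda m: f" {m.group(0)} ", text)
    words = []
    for token in text.split():
        if token.isdecimal():
            words.append(number_to_words(int(token)))
        else:
            words.append(SYMBOLS.get(token, token))
    return " ".join(words).upper()


print(normalize("Suhu badan 36°C"))  # SUHU BADAN TIGA PULUH ENAM DERAJAT CELSIUS
```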


Consonant recognition, the second step, recognizes the letters of the text input given by the device; spaces are recognized as well. The letters B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, and Z are recognized as "K" (konsonan, consonant), and the letters A, I, U, E, and O as "V" (vowel). In addition, the letters N, Y, and G are also recognized as themselves (N as "N", Y as "Y", and G as "G"). This arrangement makes it easier to classify syllables later when consecutive consonants occur in the text, such as in the digraphs NG and NY.
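A minimal sketch of the consonant recognition step, assuming the input is already normalized text: each vowel becomes V, every other letter becomes K, and spaces are kept. The keep_nyg flag is a hypothetical option that emits N, Y, and G as themselves, as described above. With the flag off, the output matches the K-V classification of Table 3 (for example, TANGGAL becomes KVKKKVK).

```python
VOWELS = set("AEIOU")
SELF_MARKED = set("NYG")  # letters also tracked as themselves


def cv_pattern(text: str, keep_nyg: bool = False) -> str:
    """Map vowels to 'V', other letters to 'K', and keep spaces.

    With keep_nyg=True, N, Y and G are emitted as themselves instead of 'K'.
    Digits and punctuation are dropped, since the text is assumed normalized.
    """
    out = []
    for ch in text.upper():
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append(ch if keep_nyg and ch in SELF_MARKED else "K")
        elif ch.isspace():
            out.append(" ")
    return "".join(out)


print(cv_pattern("HARI INI TANGGAL"))        # KVKV VKV KVKKKVK
print(cv_pattern("TANGGAL", keep_nyg=True))  # KVNGGVK
```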

Word classification and fragmentation, the third step, classifies and splits words using a transition diagram designed in three levels. The first level recognizes the patterns V, K, and KV, and its result is passed on to the next level. The second level recognizes syllables with the patterns V, VK, VKK, KV, KVK, KKV, KKVK, KKKV, and KKKVK for all consonants other than n, k, and s. The third level recognizes the syllable patterns VKK, KVKK, and KKVKK, which cannot be recognized at the previous level.
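The three-level FSA is not specified in enough detail here to reproduce exactly. As a hedged stand-in, the sketch below applies the standard Indonesian orthographic splitting rules (V-KV, VK-KV, and VK-KKV splits, with NG, NY, KH, and SY kept together as single consonants), which yields the same splits as the examples in Section 3. The names letter_units and syllabify and the digraph list are assumptions, and vowel sequences are always split, so diphthongs such as AI in PANTAI are not handled.

```python
from typing import List

VOWELS = set("aeiou")
DIGRAPHS = ("ng", "ny", "kh", "sy")  # consonant digraphs kept as one unit


def letter_units(word: str) -> List[str]:
    """Split a word into letters, keeping the digraphs above together."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def syllabify(word: str) -> List[str]:
    """Split one Indonesian word into uppercase syllables."""
    units = letter_units(word)
    nuclei = [i for i, u in enumerate(units) if u in VOWELS]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        gap = right - left - 1            # consonant units between two vowels
        if gap == 0:
            cuts.append(right)            # V-V:  DU-A
        elif gap == 1:
            cuts.append(right - 1)        # V-KV: HA-RI
        else:
            cuts.append(left + 2)         # VK-KV, VK-KKV: TANG-GAL, IN-STRU-MEN
    pieces, start = [], 0
    for cut in cuts + [len(units)]:
        pieces.append("".join(units[start:cut]).upper())
        start = cut
    return pieces


print(" ".join("-".join(syllabify(w)) for w in "DETAK JANTUNG DELAPAN PULUH".split()))
# DE-TAK JAN-TUNG DE-LA-PAN PU-LUH
```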

Together, the three transition diagrams produce twelve syllable (phoneme) classes from words cut according to Indonesian rules. The goal is to recognize the syllables of Indonesian sentences; once the syllables of spoken language are recognized, they can be used in TTS. Figure 4 shows the design of the device. It uses two sensors: (1) a MAX30100 pulse oximeter sensor to measure oxygen level and heart rate, and (2) a DS18B20 sensor to measure temperature. The sensor readings are processed by the FSA embedded in the Raspberry Pi, which is triggered when the button is pressed. The device also has a microSD card that stores the sound snippets.
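The paper does not state how the Raspberry Pi reads the DS18B20. One common route, assumed here, is the Linux 1-Wire driver (enabled on Raspberry Pi OS with the w1-gpio overlay), which exposes each sensor as a directory whose w1_slave file holds a CRC line ending in YES and a t= field in thousandths of a degree Celsius. The function names are hypothetical, and the MAX30100, which uses I²C, is not covered.

```python
from pathlib import Path

W1_DEVICES = "/sys/bus/w1/devices"  # Linux 1-Wire sysfs directory


def parse_w1_slave(raw: str) -> float:
    """Return degrees Celsius from the text of a DS18B20 w1_slave file."""
    lines = raw.strip().splitlines()
    if len(lines) < 2 or not lines[0].rstrip().endswith("YES"):
        raise ValueError("CRC check failed or reading incomplete")
    _, found, millidegrees = lines[1].partition("t=")
    if not found:
        raise ValueError("no 't=' field in reading")
    return int(millidegrees) / 1000.0


def read_ds18b20(device_dir: str = W1_DEVICES) -> float:
    """Read the first DS18B20 (1-Wire family code 28) found under device_dir."""
    sensors = sorted(Path(device_dir).glob("28-*"))
    if not sensors:
        raise FileNotFoundError(f"no DS18B20 under {device_dir}")
    return parse_w1_slave((sensors[0] / "w1_slave").read_text())
```

Separating parsing from file access lets the parsing be checked without the sensor attached.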
[Figure 4 components: real time clock (RTC DS3231), push button, MAX30100 pulse oximeter sensor, microSD, speaker, power bank, and an internet server with a software application]

Figure 4. Tool design


Figure 5 shows the speech formation block diagram. After the button is pressed and the sensors have read the body data for approximately 1 minute, the readings are normalized into text in sequence and processed by the FSA. The sound file representing each piece of text is then looked up on the microSD card, and the sound is played through the speaker.
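The lookup step can be sketched as follows, assuming (hypothetically) that each recording on the microSD card is named after the syllable it contains, as in the SE.mp3, KA.mp3, RANG.mp3, and Jam.mp3 examples of Figure 5. Because Jam.mp3 mixes letter case, the lookup ignores case. The name playback_plan is hypothetical.

```python
from typing import Iterable, List


def playback_plan(syllables: Iterable[str], available: Iterable[str]) -> List[str]:
    """Map syllables, in order, to recorded clip names, ignoring letter case."""
    by_key = {name.lower(): name for name in available}
    plan = []
    for syl in syllables:
        key = f"{syl}.mp3".lower()
        if key not in by_key:
            raise KeyError(f"no recording for syllable {syl!r}")
        plan.append(by_key[key])
    return plan


clips = ["SE.mp3", "KA.mp3", "RANG.mp3", "Jam.mp3"]
print(playback_plan(["SE", "KA", "RANG", "JAM"], clips))
# ['SE.mp3', 'KA.mp3', 'RANG.mp3', 'Jam.mp3']
```

On the device, the returned names would be joined with the microSD mount directory and passed to an audio player (for example, the mpg123 command-line player); both choices are assumptions.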


[Figure 5 blocks: pieces of sound recordings (e.g., SE.mp3, KA.mp3, RANG.mp3, and Jam.mp3) → diphone database → concatenation engine, driven by the phoneme sequence, diphone duration, and pitch → voice]

Figure 5. Speech formation block diagram


To pronounce the compiled words or sentences, a diphone database containing all diphone data is used, as shown in Figure 5. The diphone concatenation engine (processing unit) receives as input the sequence of phonemes to be pronounced, together with their duration and pitch (frequency). It then smooths the joins between diphones and adjusts the duration and pitch of the pronunciation.
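As a simplified, hedged illustration of smoothing the joins, the function below concatenates sample sequences with a linear crossfade over a fixed number of samples. A real diphone engine would also modify duration and pitch (for example with PSOLA), which this sketch does not attempt; the function name and the overlap parameter are assumptions.

```python
from typing import List, Sequence


def crossfade_concat(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join audio segments, blending `overlap` samples at each joint linearly."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)          # weight of the incoming segment rises
            out[start + i] = out[start + i] * (1 - w) + seg[i] * w
        out.extend(seg[n:])
    return out


print(crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2))
# approximately [1.0, 1.0, 0.667, 0.333, 0.0, 0.0]
```

Each joint shortens the output by the overlap length, so the result has len(a) + len(b) - overlap samples for two segments.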
3. RESULTS AND DISCUSSION
3.1. FSA results 

The results of the FSA in this study are divided into three parts: the first part produces the sound of 
the date, the second produces the sound of time, and the third produces the sound of temperature, oxygen 
saturation, and heart rate. 


3.1.1. Date voice 
The date FSA is defined as follows:


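$M = (S, \Sigma, \delta, S_0, F)$

where:

$S = \{S_0, S_1, S_2, S_3, S_4, S_5\}$

$\Sigma = \{\text{blank/space}, KV, V, KVK, KVKK\}$

$\delta(S_0, \text{blank/space}) = S_1$, $\delta(S_0, KV) = S_2$, $\delta(S_0, V) = S_3$, $\delta(S_0, KVK) = S_4$, $\delta(S_4, K) = S_5$

$S_0$: initial state

$F = \{S_0, S_5\}$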

Figure 6 shows the transition diagram used to process the date. An example of the output is "HARI INI TANGGAL TIGA BULAN AGUSTUS TAHUN DUA RIBU DUA PULUH SATU" (today is the third of August 2021), in which the numbers have already been normalized as described in the previous step. The FSA splits this sentence into "HA-RI", "I-NI", "TANG-GAL", "TI-GA", "BU-LAN", "A-GUS-TUS", "TA-HUN", "DU-A", "RI-BU", "DU-A", "PU-LUH", "SA-TU", each word being split according to the Indonesian language rules described in the previous step.
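Reading Figure 6 as a per-syllable classifier, a minimal sketch walks a syllable's K/V pattern through δ and reports the state reached; following the annotation, S2, S3, S4, and S5 mark KV, V, KVK, and KVKK syllables. Two assumptions are made: a KVKK syllable is consumed as KVK followed by K (the only route to S5 in δ), and the machine restarts from S0 for each syllable. The names DATE_DELTA, pattern_of, and final_state are hypothetical.

```python
# Transition function of the date FSA; " " stands for blank/space.
DATE_DELTA = {
    ("S0", " "): "S1",
    ("S0", "KV"): "S2",
    ("S0", "V"): "S3",
    ("S0", "KVK"): "S4",
    ("S4", "K"): "S5",
}
# Try longer input symbols first so KVKK is read as KVK followed by K.
SYMBOLS = sorted({sym for _, sym in DATE_DELTA}, key=len, reverse=True)


def pattern_of(syllable: str) -> str:
    """K/V pattern of a syllable; any non-letter maps to ' '."""
    return "".join("V" if c in "AEIOU" else ("K" if c.isalpha() else " ")
                   for c in syllable.upper())


def final_state(pattern: str, start: str = "S0") -> str:
    """Consume the pattern symbol by symbol and return the state reached."""
    state, i = start, 0
    while i < len(pattern):
        for sym in SYMBOLS:
            if pattern.startswith(sym, i) and (state, sym) in DATE_DELTA:
                state = DATE_DELTA[(state, sym)]
                i += len(sym)
                break
        else:
            raise ValueError(f"no transition from {state} on {pattern[i:]!r}")
    return state


print([final_state(pattern_of(s)) for s in ["HA", "RI", "I", "NI", "TANG", "GAL"]])
# ['S2', 'S2', 'S3', 'S2', 'S5', 'S4']
```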


[Figure 6 annotation: S0, initial state; S1, recognize blank/space; S2, recognize KV; S3, recognize V; S4, recognize KVK; S5, recognize KVKK]

Figure 6. Date FSA transition diagram


3.1.2. Time voice 

In addition to date data, the FSA is applied to time data, with:

$S = \{S_0, S_1, S_2, S_3, S_4, S_5\}$

$\Sigma = \{\text{space/blank}, KV, KVK, KVKK, V\}$

$\delta(S_0, \text{blank/space}) = S_1$, $\delta(S_0, KV) = S_2$, $\delta(S_0, KVK) = S_3$, $\delta(S_3, K) = S_4$, $\delta(S_0, V) = S_5$

$S_0$: initial state

$F = \{S_0, S_5\}$

Figure 7 shows the transition diagram used to process the time. An example of the output is "SEKARANG JAM DUA BELAS NOL NOL" (it is now 12:00), in which the numbers have already been normalized as described in the previous step. The FSA splits it into "SE-KA-RANG", "JAM", "DU-A", "BE-LAS", "NOL", "NOL", each word being split according to the Indonesian language rules described in the previous step.
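A sketch of how an RTC reading could be turned into the time sentence above. It is an assumption that minutes are always read digit by digit; the paper gives only the NOL NOL example for minute 00. The names hour_words and time_sentence are hypothetical.

```python
from datetime import datetime

DIGITS = ["nol", "satu", "dua", "tiga", "empat",
          "lima", "enam", "tujuh", "delapan", "sembilan"]


def hour_words(hour: int) -> str:
    """Indonesian words for an hour value 0-23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 10:
        return DIGITS[hour]
    if hour == 10:
        return "sepuluh"
    if hour == 11:
        return "sebelas"
    if hour < 20:
        return f"{DIGITS[hour - 10]} belas"
    return "dua puluh" if hour == 20 else f"dua puluh {DIGITS[hour - 20]}"


def time_sentence(hour: int, minute: int) -> str:
    """Build the spoken time; minutes are read digit by digit (assumption)."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    minute_words = " ".join(DIGITS[int(d)] for d in f"{minute:02d}")
    return f"SEKARANG JAM {hour_words(hour)} {minute_words}".upper()


print(time_sentence(12, 0))  # SEKARANG JAM DUA BELAS NOL NOL
now = datetime.now()
print(time_sentence(now.hour, now.minute))
```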
[Figure 7 annotation: S0, initial state; S1, recognize blank/space; S2, recognize KV; S3, recognize KVK; S4, recognize KVKK; S5, recognize V]

Figure 7. Time FSA transition diagram


3.1.3. Health data voice 

Next, the FSA is applied to health data, with:

$S = \{S_0, S_1, S_2, S_3, S_4, S_5, S_6\}$

$\Sigma = \{\text{space/blank}, KV, VK, V, KVK, KVKK\}$

$\delta(S_0, \text{blank/space}) = S_1$, $\delta(S_0, KV) = S_2$, $\delta(S_0, KVK) = S_3$, $\delta(S_3, V) = S_4$, $\delta(S_4, K) = S_5$, $\delta(S_3, K) = S_6$

$S_0$: initial state

$F = \{S_0, S_6\}$

Figure 8 shows the transition diagram used to process the health data. An example of the output is "SUHU BADAN TIGA PULUH ENAM DERAJAT CELSIUS DETAK JANTUNG DELAPAN PULUH ENAM BIT PER MENIT KADAR OKSIGEN SEMBILAN PULUH PERSEN" (body temperature thirty-six degrees Celsius, heart rate eighty-six beats per minute, oxygen level ninety percent), in which the numbers have already been normalized as described in the previous step. The FSA splits it into "SU-HU", "BA-DAN", "TI-GA", "PU-LUH", "E-NAM", "DE-RA-JAT", "CEL-SI-US", "DE-TAK", "JAN-TUNG", "DE-LA-PAN", "PU-LUH", "E-NAM", "BIT", "PER", "ME-NIT", "KA-DAR", "OK-SI-GEN", "SEM-BI-LAN", "PU-LUH", "PER-SEN", each word being split according to the Indonesian language rules described in the previous step.


[Figure 8 annotation: S0, initial state; S1, recognize blank/space; S2, recognize KV; S3, recognize KVK; S4, recognize V; S5, recognize VK; S6, recognize KVKK]

Figure 8. Health data FSA transition diagram


3.2. Hardware result 
This section presents the resulting device. To ensure that it does not slip out of the hands of visually impaired users, the device is attached to the wrist. Figure 9 shows how the device is built and used: the MAX30100 sensor is clamped on the user's thumb for approximately 1 minute to obtain accurate data for analysis.


Figure 9. The final result of the created tool 


3.3. Measurement 

Testing was repeated fifteen times for the date, time, and health data, as shown in Table 3. Table 4 shows that the test accuracy was 100%: for every sensor reading, the words were split into the correct syllables and the correct sound was produced. The results of the sensor tests are shown in Figures 10-12. Figure 10 shows the accuracy test of the temperature sensor against a commercially available infrared thermometer. The average margin is 0.29 °C, giving a sensor accuracy of 99.71%, which indicates that the sensor reads the body temperature of visually impaired users very well.

Figures 11 and 12 show that our sensor also compares well with a fingertip oximeter over 82 tests: the average error was 2% for heart rate and 5% for oxygen level, so the device has fairly good accuracy compared with devices currently on the market. To find the sensor placement that gives the best accuracy, readings taken on a finger (the thumb or the index finger) were compared with readings taken on the wrist. The average difference was 9.85 beats per minute in heart rate (accuracy 88.80%) and 9.46% in oxygen level (accuracy 89.78%), and placement on the finger gave better values.


Table 3. FSA testing

Push button | Text | K-V classification | Syllable split (K-V) | Trials 1-15
1x | HARI INI TANGGAL TIGA | KVKV VKV KVKKKVK KVKV | KV-KV V-KV KVKK-KVK KV-KV | 15/15 correct
1x | BULAN AGUSTUS TAHUN | KVKVK VKVKKVK KVKVK | KV-KVK V-KVK-KVK KV-KVK | 15/15 correct
1x | DUA RIBU DUA PULUH | KVV KVKV KVV KVKVK | KV-V KV-KV KV-V KV-KVK | 15/15 correct
1x | SATU | KVKV | KV-KV | 15/15 correct
2x | SEKARANG JAM DUA | KVKVKVKK KVK KVV | KV-KV-KVKK KVK KV-V | 15/15 correct
2x | BELAS NOL NOL | KVKVK KVK KVK | KV-KVK KVK KVK | 15/15 correct
3x | SUHU BADAN TIGA | KVKV KVKVK KVKV | KV-KV KV-KVK KV-KV | 15/15 correct
3x | PULUH ENAM | KVKVK VKVK | KV-KVK V-KVK | 15/15 correct
3x | DERAJAT CELSIUS | KVKVKVK KVKKVVK | KV-KV-KVK KVK-KV-VK | 15/15 correct
3x | DETAK JANTUNG | KVKVK KVKKVKK | KV-KVK KVK-KVKK | 15/15 correct
3x | DELAPAN PULUH | KVKVKVK KVKVK | KV-KV-KVK KV-KVK | 15/15 correct
3x | ENAM BIT PER MENIT | VKVK KVK KVK KVKVK | V-KVK KVK KVK KV-KVK | 15/15 correct
3x | KADAR OKSIGEN | KVKVK VKKVKVK | KV-KVK VK-KV-KVK | 15/15 correct
3x | SEMBILAN PULUH | KVKKVKVK KVKVK | KVK-KV-KVK KV-KVK | 15/15 correct
3x | PERSEN ANDA SEHAT | KVKKVK VKKV KVKVK | KVK-KVK VK-KV KV-KVK | 15/15 correct
Table 4. Confusion matrix of the FSA test

Actual/predicted class | TP | TN | FP | FN
One-time press | 180 | 0 | 0 | 0
Double press | 90 | 0 | 0 | 0
Three presses | 330 | 0 | 0 | 0
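Accuracy from a confusion matrix is (TP + TN) / (TP + TN + FP + FN). The sketch below applies it to the counts in Table 4; the function name accuracy is an assumption. The counts are consistent with fifteen trials per word: the date sentence has 12 words (12 × 15 = 180), the time sentence 6 (6 × 15 = 90), and the health sentence, including ANDA SEHAT, 22 (22 × 15 = 330).

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correct decisions among all decisions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


# Counts from Table 4: (TP, TN, FP, FN)
TABLE_4 = {
    "One-time press": (180, 0, 0, 0),
    "Double press": (90, 0, 0, 0),
    "Three presses": (330, 0, 0, 0),
}

for label, counts in TABLE_4.items():
    print(f"{label}: {accuracy(*counts):.0%}")  # 100% for each row
```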
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Temperature Margin Value Between Tool and Sensor Heart Rate Margin Value Between Tool and Sensor 
70 100% 50 100% 
90% 45 90% 
5 ®© 80% S 40 oe 
3 v o 
5 50 70% F g 35 70% $ 
? o Šg 3 30 ox S 
g 25 $ 25 50% &%Ë 
G 7 SE w £5 
5 30 PA Eg ° 20 40% S 3 
3 n 838 2 15 30% 5 á 
E 20 W tt 3 10 20% 
2 i0 20% 5 10% 
10% o 0% 
0 0% (2%, 4%] (4%, 6%] (16%, 18%] (12%, 14%] 
[0, 0.36] (0.36, 0.72](0.72, 1.08] (1.8, 2.16] (1.44, 1.8] (1.08, 1.44] [0%, 2%] (6%, 8%] (8%, 10%] (10%, 12%] (14%, 16%] 
Absolute Error Absolute error 
Figure 10. Temperature sensor accuracy test results Figure 11. Heart rate sensor accuracy test results 


Oxygen Saturation Value Between Tool and Sensor 


5 
a 70% 
3 20 60% 
2 50% 
15 40% 
2 oy 
= 10 30% 
= À 
2 20% 
5 
10% 
(o) 0% 


[0%, 3%] (6%, 8%] (3%, 6%] (8%, 11%] (11%, 14%] 


Absolute error 


Figure 12. Oxygen saturation sensor accuracy test results 


4. CONCLUSION 

Applying the FSA method to TTS, to recognize words and cut them into syllable patterns according to Indonesian rules, can serve as an alternative pronunciation process for an assistive device for the blind. The FSA can also read any normalized symbol and input text given by the device or a web application. Applied to the wrist-worn device for the blind, the FSA succeeded in cutting words into syllables and, when tested with a confusion matrix, gave correct results for both absolute and non-absolute words. The process normalizes text by converting numbers and symbols into words, recognizes the vowels and consonants in each word, classifies and splits the words, and produces syllables according to the FSA transition diagrams.

Testing the FSA method with a confusion matrix gave an accuracy of 100%. The device was then tested 84 times, once every hour over 3 days and 12 hours. For the RTC DS3231 component compared against a mobile phone clock, a single button press gave a difference of 0 (accuracy 100%), and a double press gave a difference of 0.30 seconds (accuracy 99.97%). The DS18B20 sensor compared with an infrared thermometer gave a difference of 0.29 °C (accuracy 99.71%). The MAX30100 sensor compared with a fingertip oximeter gave a heart rate difference of 2.02 beats per minute (accuracy 99.76%) and an oxygen level difference of 4.79% (accuracy 98.43%).

Comparing placement of the MAX30100 sensor on the fingers and on the wrist gave an average heart rate difference of 9.85 beats per minute (accuracy 88.80%) and an average oxygen level difference of 9.46% (accuracy 89.78%). The method should be tested again on larger and more varied data to establish how accurate the FSA method really is. Further research could also add other sensors, for example to measure a person's stress level and sleep quality, as well as other measurements.